4  Long Short-Term Memory (LSTM)

LSTM networks are a type of deep learning neural network. They use a gated structure with memory cells to store information over extended periods (Guo et al. 2024). These memory cells contain gates that regulate the flow of information within the network. A memory cell consists of four components: the forget gate, the input gate, the output gate, and a cell state (Rhif et al. 2020).

  • The forget gate decides which information from the previous cell state should be discarded.
  • The input gate determines which new information is added to the cell state.
  • The cell state is updated by combining the information kept by the forget gate with the new information from the input gate (Sherif et al. 2023).
  • The output gate decides what the current cell state will output.

Thus, the network can maintain and update its internal state, learning which information to retain and which to forget (Sherif et al. 2023).
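The gate interactions described above can be sketched as a single memory-cell update in plain numpy (a minimal illustration, not the Keras implementation; the weight shapes and names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    forget (f), input (i), cell-candidate (g) and output (o) gates."""
    z = W @ x_t + U @ h_prev + b          # shape: (4 * hidden,)
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g              # forget old state, add new information
    h_t = o * np.tanh(c_t)                # output gate filters the cell state
    return h_t, c_t

# Toy dimensions: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_cell_step(rng.standard_normal(n_in), h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```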

4.1 Overview

  • Loading the necessary packages
  • Defining the base directory
  • Initially, a parameter search is conducted using GridSearch (0. GridSearch). During this step, the data for a single cube is loaded and prepared so that the optimal model parameters can be determined.
  • Subsequently, the code for the final LSTM model is executed:
    • Load and normalize the data for all utilized cubes
    • Create sequences for the model
    • Prepare the data for model training: flatten, reshape into the correct dimensions, combine data from all cubes, and mask out NaN values
    • Construct the model
    • Train the model
    • Generate predictions for the test dataset
    • Denormalize the predictions
    • Save the predictions

4.2 Requirements

The notebook was built using GPU resources and the kernel Python 3.9 TensorFlow 2.6.6 CUDA.

4.2.1 Load packages

import os
import xarray as xr
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Masking, Dropout, Input, BatchNormalization
from tensorflow.keras.optimizers import Adam, RMSprop, SGD
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import ParameterGrid
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import load_model
from pathlib import Path
import glob
import netCDF4 as nc

4.2.2 Define base directory

# Define base_dir for consistent path management
notebook_dir = Path(os.getcwd()).resolve()
base_dir = notebook_dir
print(base_dir)
/home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe

4.3 0. GridSearch

To find the best parameters for the model given the data, we performed a GridSearch. Unfortunately, the computational capacity was insufficient to conduct the GridSearch for all four cubes, so we used only one cube (Cube 665) as the data basis. In the following, the cube's data is loaded and prepared for the model, which is then tested with all parameter combinations. The best parameters are presented as output.
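ParameterGrid simply enumerates the Cartesian product of the parameter lists; the full grid below therefore contains 2⁷ = 128 combinations. The same enumeration, sketched with the standard library for a smaller hypothetical grid:

```python
from itertools import product

# Hypothetical small grid for illustration
param_grid = {
    'lstm_units': [50, 100],
    'dropout_rate': [0.2, 0.3],
    'optimizer': ['adam', 'rmsprop'],
}
keys = sorted(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]
print(len(combinations))  # 2 * 2 * 2 = 8
```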

def load_data(train_file, test_file):
    """
    Load and extract NDVI data from .nc files.

    Parameters:
    train_file (str): Path to the training data file.
    test_file (str): Path to the testing data file.

    Returns:
    tuple: Training and test NDVI data.
    """
    ds_train = xr.open_dataset(train_file)
    ndvi_train = ds_train['NDVI'].values
    ds_test = xr.open_dataset(test_file)
    ndvi_test = ds_test['NDVI'].values
    return ndvi_train, ndvi_test

def prepare_data(ndvi_train, ndvi_test, sequence_length, pred_length):
    """
    Normalize and prepare data for LSTM model training.

    Parameters:
    ndvi_train (np.ndarray): Training NDVI data.
    ndvi_test (np.ndarray): Test NDVI data.
    sequence_length (int): Length of input sequences.
    pred_length (int): Length of prediction sequences.

    Returns:
    tuple: Prepared training and test data, input shape, and output units.
    """
    scaler = MinMaxScaler(feature_range=(0, 1))
    ndvi_train_normalized = scaler.fit_transform(ndvi_train.reshape(-1, 1)).reshape(ndvi_train.shape)
    ndvi_test_normalized = scaler.transform(ndvi_test.reshape(-1, 1)).reshape(ndvi_test.shape)

    def create_sequences(data, seq_length, pred_length):
        X, y = [], []
        for i in range(len(data) - seq_length - pred_length + 1):
            X.append(data[i:i + seq_length])
            y.append(data[i + seq_length:i + seq_length + pred_length])
        return np.array(X), np.array(y)

    X_train, Y_train = create_sequences(ndvi_train_normalized, sequence_length, pred_length)
    X_test, Y_test = create_sequences(ndvi_test_normalized, sequence_length, pred_length)

    X_train_flattened = X_train.reshape(X_train.shape[0], X_train.shape[1], -1)
    Y_train_flattened = Y_train.reshape(Y_train.shape[0], Y_train.shape[1], -1)
    X_test_flattened = X_test.reshape(X_test.shape[0], X_test.shape[1], -1)
    Y_test_flattened = Y_test.reshape(Y_test.shape[0], Y_test.shape[1], -1)

    X_train_flattened = np.nan_to_num(X_train_flattened, nan=0)
    Y_train_flattened = np.nan_to_num(Y_train_flattened, nan=0)
    X_test_flattened = np.nan_to_num(X_test_flattened, nan=0)
    Y_test_flattened = np.nan_to_num(Y_test_flattened, nan=0)

    input_shape = (X_train_flattened.shape[1], X_train_flattened.shape[2])
    output_units = Y_train_flattened.shape[1] * Y_train_flattened.shape[2]

    return X_train_flattened, Y_train_flattened, X_test_flattened, Y_test_flattened, input_shape, output_units

def build_model(input_shape, output_units, params):
    """
    Build and compile the LSTM model.

    Parameters:
    input_shape (tuple): Shape of the input data.
    output_units (int): Number of output units.
    params (dict): Dictionary containing model parameters.

    Returns:
    Sequential: Compiled LSTM model.
    """
    lstm_units = params['lstm_units']
    dropout_rate = params['dropout_rate']
    learning_rate = params['learning_rate']
    optimizer_name = params['optimizer']
    use_batch_norm = params['use_batch_norm']

    if optimizer_name == 'adam':
        optimizer = Adam(learning_rate=learning_rate)
    elif optimizer_name == 'rmsprop':
        optimizer = RMSprop(learning_rate=learning_rate)
    else:
        optimizer = SGD(learning_rate=learning_rate)

    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(Masking(mask_value=0))
    model.add(LSTM(lstm_units, return_sequences=True))
    if use_batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(LSTM(lstm_units))
    if use_batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(output_units))
    model.compile(optimizer=optimizer, loss='mse')
    return model

def train_and_evaluate(params, input_shape, output_units, X_train, Y_train, X_test, Y_test):
    """
    Train and evaluate the LSTM model based on provided parameters.

    Parameters:
    params (dict): Dictionary containing model parameters.
    input_shape (tuple): Shape of the input data.
    output_units (int): Number of output units.
    X_train (np.ndarray): Training input data.
    Y_train (np.ndarray): Training target data.
    X_test (np.ndarray): Test input data.
    Y_test (np.ndarray): Test target data.

    Returns:
    dict: Training result including parameters and loss.
    """
    model = build_model(input_shape, output_units, params)
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    history = model.fit(X_train, Y_train.reshape(Y_train.shape[0], -1), epochs=params['epochs'], batch_size=params['batch_size'], validation_split=0.2, callbacks=[early_stopping], verbose=0)
    loss = model.evaluate(X_test, Y_test.reshape(Y_test.shape[0], -1), verbose=0)
    
    result = {
        'lstm_units': params['lstm_units'],
        'dropout_rate': params['dropout_rate'],
        'batch_size': params['batch_size'],
        'epochs': params['epochs'],
        'learning_rate': params['learning_rate'],
        'optimizer': params['optimizer'],
        'use_batch_norm': params['use_batch_norm'],
        'loss': loss
    }

    return result

# Paths to training and test data
train_file = base_dir / 'data/data_interpolated/ds_B_Cube_665_train.nc'
test_file = base_dir / 'data/data_test/Cube_665_test.nc'

# Example sequence and prediction lengths
sequence_length = 70
pred_length = 23

# Load and prepare data
ndvi_train, ndvi_test = load_data(train_file, test_file)
X_train, Y_train, X_test, Y_test, input_shape, output_units = prepare_data(ndvi_train, ndvi_test, sequence_length, pred_length)

# Define the parameter grid
param_grid = {
    'lstm_units': [50, 100],
    'dropout_rate': [0.2, 0.3],
    'batch_size': [16, 32],
    'epochs': [30, 50],
    'learning_rate': [0.001, 0.01],
    'optimizer': ['adam', 'rmsprop'],
    'use_batch_norm': [True, False]
}

# Initialize results list
results = []

# Create grid
grid = ParameterGrid(param_grid)

# Loop through all parameter combinations
for params in grid:
    result = train_and_evaluate(params, input_shape, output_units, X_train, Y_train, X_test, Y_test)
    results.append(result)

# Create DataFrame from results
results_df = pd.DataFrame(results)
print(results_df)

# Find the best parameter combination
best_params = results_df.loc[results_df['loss'].idxmin()]

print("Best Parameters:")
print(best_params)

The result is a list with the best values for each parameter examined. These are used for the final model.

5 LSTM Model

5.1 1. Load and normalize the data

First, all files from the defined directories are loaded.

Data needs to be normalized for an LSTM to ensure that the network converges faster during training. This avoids issues related to vanishing or exploding gradients. Normalization also helps to ensure that all input features contribute equally to the learning process to improve the model’s performance and stability.
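What MinMaxScaler's fit_transform computes can be sketched in numpy on a toy NDVI-like array (illustration only; the notebook uses the sklearn scaler):

```python
import numpy as np

# Min-max scaling to the range (0, 1): subtract the minimum,
# divide by the data range
data = np.array([-0.2, 0.1, 0.4, 0.8])
d_min, d_max = data.min(), data.max()
normalized = (data - d_min) / (d_max - d_min)
print(normalized)  # [0.  0.3 0.6 1. ]
```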

def load_nc_file(file_path):
    """
    Load a single .nc file.

    Parameters:
    file_path (str): Path to the .nc file.

    Returns:
    xarray.Dataset: Loaded dataset.
    """
    return xr.open_dataset(file_path)

def extract_ndvi(ds):
    """
    Extract NDVI data from the dataset.

    Parameters:
    ds (xarray.Dataset): Dataset containing NDVI data.

    Returns:
    np.ndarray: Extracted NDVI values.
    """
    return ds['NDVI'].values

# Example: Path to the data
train_dir = base_dir / 'data/data_interpolated/'
test_dir = base_dir / 'data/data_test/'

# List of training and test files
train_files = glob.glob(str(train_dir / '*.nc'))
test_files = glob.glob(str(test_dir / '*.nc'))

# Initialize training and test data lists
ndvi_train_list = []
ndvi_test_list = []

# Load training data
for file in train_files:
    ds = load_nc_file(file)
    ndvi_train = extract_ndvi(ds)
    ndvi_train_list.append(ndvi_train)
    print(f"Loaded train file: {file} with shape: {ndvi_train.shape}")

# Load test data
for file in test_files:
    ds = load_nc_file(file)
    ndvi_test = extract_ndvi(ds)
    ndvi_test_list.append(ndvi_test)
    print(f"Loaded test file: {file} with shape: {ndvi_test.shape}")

def normalize_data(ndvi_data):
    """
    Normalize the NDVI data using MinMaxScaler.

    Parameters:
    ndvi_data (np.ndarray): NDVI data to be normalized.

    Returns:
    np.ndarray: Normalized NDVI data.
    MinMaxScaler: Scaler used for normalization.
    """
    scaler = MinMaxScaler(feature_range=(0, 1))
    flattened_data = ndvi_data.reshape(-1, 1)
    normalized_data = scaler.fit_transform(flattened_data).reshape(ndvi_data.shape)
    return normalized_data, scaler

# Normalize training and test data
ndvi_train_normalized_list = []
ndvi_test_normalized_list = []
scalers = []  # List of scalers for each file

for ndvi_train in ndvi_train_list:
    normalized_data, scaler = normalize_data(ndvi_train)
    ndvi_train_normalized_list.append(normalized_data)
    scalers.append(scaler)
    print(f"Normalized train data shape: {normalized_data.shape}")

for ndvi_test in ndvi_test_list:
    normalized_data, scaler = normalize_data(ndvi_test)
    ndvi_test_normalized_list.append(normalized_data)
    print(f"Normalized test data shape: {normalized_data.shape}")
Loaded train file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_interpolated/ds_B_Cube_665_train.nc with shape: (292, 128, 128)
Loaded train file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_interpolated/ds_B_Cube_1203_train.nc with shape: (292, 128, 128)
Loaded train file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_interpolated/ds_B_Cube_80_train.nc with shape: (292, 128, 128)
Loaded train file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_interpolated/ds_B_Cube_1301_train.nc with shape: (292, 128, 128)
Loaded test file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_test/Cube_1203_test.nc with shape: (93, 128, 128)
Loaded test file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_test/Cube_80_test.nc with shape: (93, 128, 128)
Loaded test file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_test/Cube_1301_test.nc with shape: (93, 128, 128)
Loaded test file: /home/sc.uni-leipzig.de/kt501gqiy/_LIM_lectures/2024_SoSe/data/data_test/Cube_665_test.nc with shape: (93, 128, 128)
Normalized train data shape: (292, 128, 128)
Normalized train data shape: (292, 128, 128)
Normalized train data shape: (292, 128, 128)
Normalized train data shape: (292, 128, 128)
Normalized test data shape: (93, 128, 128)
Normalized test data shape: (93, 128, 128)
Normalized test data shape: (93, 128, 128)
Normalized test data shape: (93, 128, 128)

5.2 2. Data preparation

The following steps are carried out here:

  • Creation of the sequences that the model needs for training: for seasonal data, the sequence length should typically capture at least one full seasonal cycle, so that the model can learn the repeating patterns and trends effectively. We therefore set the sequence length to 70, close to one full cycle (365 days / 5-day temporal resolution = 73 time steps).
  • Flattening the data and dividing it into sequences: flattening and reshaping ensure the data is compatible with the LSTM input requirements:
    • X should be a 3D array of shape (num_samples, time_steps, num_features), where num_samples is the number of training samples, time_steps is the length of the sequence fed into the LSTM, and num_features is the number of features at each time step.
    • Y should be a 2D array of shape (num_samples, num_targets), where num_targets is the number of targets to predict.
  • Combination of the data from all cubes.
  • Masking NaN values: LSTMs cannot handle NaN values because they disrupt the computations required for training; they can lead to invalid loss values and prevent the network from learning properly.
  • Reshaping the combined data arrays to the required dimensions.
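Applied to a toy array, the windowing logic described above produces shapes like the following (the function mirrors the notebook's create_sequences; the toy dimensions are arbitrary):

```python
import numpy as np

def create_sequences(data, seq_length, pred_length):
    X, y = [], []
    for i in range(len(data) - seq_length - pred_length + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + pred_length])
    return np.array(X), np.array(y)

# 10 time steps, 4 "pixels" each; windows of 5 inputs -> 2 targets
toy = np.arange(40).reshape(10, 4)
X, y = create_sequences(toy, seq_length=5, pred_length=2)
print(X.shape, y.shape)  # (4, 5, 4) (4, 2, 4)
```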

sequence_length = 70
pred_length = 23

def create_sequences(data, seq_length, pred_length):
    """
    Create sequences and targets for the model.

    Parameters:
    data (np.ndarray): Input data for sequence creation.
    seq_length (int): Length of the input sequences.
    pred_length (int): Length of the prediction sequences.

    Returns:
    np.ndarray: Array of input sequences.
    np.ndarray: Array of target sequences.
    """
    X, y = [], []
    for i in range(len(data) - seq_length - pred_length + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + pred_length])
    return np.array(X), np.array(y)

def flatten_data(data):
    """
    Flattens the height and width dimensions into a single feature dimension.
    
    Args:
        data (numpy.ndarray): The data to be flattened.

    Returns:
        numpy.ndarray: The flattened data.
    """
    return data.reshape(data.shape[0], data.shape[1], -1)

def prepare_data(ndvi_data_list, sequence_length, pred_length):
    """
    Prepares the data for model training or testing with sequences.
    
    Args:
        ndvi_data_list (list): List of NDVI data arrays.
        sequence_length (int): The length of the sequence.
        pred_length (int): The length of the prediction.

    Returns:
        tuple: Flattened input and target sequences.
    """
    X_list, Y_list = [], []
    for ndvi_data in ndvi_data_list:
        X, Y = create_sequences(ndvi_data, sequence_length, pred_length)
        X_list.append(flatten_data(X))
        Y_list.append(flatten_data(Y))
    return X_list, Y_list

def combine_data(X_list, Y_list):
    """
    Combines and flattens the list of data arrays into single arrays.
    
    Args:
        X_list (list): List of input sequences.
        Y_list (list): List of target sequences.

    Returns:
        tuple: Combined and flattened input and target sequences.
    """
    X_combined = np.concatenate(X_list, axis=0)
    Y_combined = np.concatenate(Y_list, axis=0)
    return X_combined, Y_combined

def check_nan(data, name):
    """
    Checks for NaN values in the data.
    
    Args:
        data (numpy.ndarray): The data to be checked.
        name (str): The name of the data for printing.

    Returns:
        None
    """
    nan_in_data = np.any(np.isnan(data))
    print(f"NaN in {name}: {nan_in_data}")

def reshape_combined_data(X, Y, sequence_length):
    """
    Reshapes the combined data arrays to the required dimensions.
    
    Args:
        X (numpy.ndarray): Combined input sequences.
        Y (numpy.ndarray): Combined target sequences.
        sequence_length (int): The length of the sequence.

    Returns:
        tuple: Reshaped input and target sequences.
    """
    X = X.reshape((X.shape[0], sequence_length, -1))
    Y = Y.reshape((Y.shape[0], -1))
    return X, Y

# Prepare training data
X_train_list, Y_train_list = prepare_data(ndvi_train_normalized_list, sequence_length, pred_length)

# Prepare testing data
X_test_list, Y_test_list = prepare_data(ndvi_test_normalized_list, sequence_length, pred_length)

# Combine training data
X_train_combined, Y_train_combined = combine_data(X_train_list, Y_train_list)

# Combine testing data
X_test_combined, Y_test_combined = combine_data(X_test_list, Y_test_list)

# Replace NaNs with zeros
X_train_combined = np.nan_to_num(X_train_combined, nan=0)
Y_train_combined = np.nan_to_num(Y_train_combined, nan=0)
X_test_combined = np.nan_to_num(X_test_combined, nan=0)
Y_test_combined = np.nan_to_num(Y_test_combined, nan=0)

# Check for NaNs in the data
check_nan(X_train_combined, "X_train_combined")
check_nan(Y_train_combined, "Y_train_combined")
check_nan(X_test_combined, "X_test_combined")
check_nan(Y_test_combined, "Y_test_combined")

# Reshape combined data
X_train_combined, Y_train_combined = reshape_combined_data(X_train_combined, Y_train_combined, sequence_length)
X_test_combined, Y_test_combined = reshape_combined_data(X_test_combined, Y_test_combined, sequence_length)

# Print final shapes of combined data
print(f"Final X_train_combined shape: {X_train_combined.shape}")
print(f"Final Y_train_combined shape: {Y_train_combined.shape}")
print(f"Final X_test_combined shape: {X_test_combined.shape}")
print(f"Final Y_test_combined shape: {Y_test_combined.shape}")
NaN in X_train_combined: False
NaN in Y_train_combined: False
NaN in X_test_combined: False
NaN in Y_test_combined: False
Final X_train_combined shape: (800, 70, 16384)
Final Y_train_combined shape: (800, 376832)
Final X_test_combined shape: (4, 70, 16384)
Final Y_test_combined shape: (4, 376832)

5.3 3. Create Model

Here, the model is created and compiled. Four cubes are used for the model (see 1_Data_Preprocessing). The best hyperparameters from the GridSearch are applied to the model. A callback for early stopping is also added: early stopping is a regularization technique that halts training when the validation loss stops improving, preventing overfitting.
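The callback's behaviour can be sketched in plain Python (a simplified illustration of patience and restore_best_weights, not the Keras code):

```python
def early_stopping_trace(val_losses, patience):
    """Sketch of EarlyStopping(monitor='val_loss'): stop once val_loss
    has not improved for `patience` epochs, and report the epoch whose
    weights would be restored."""
    best_loss, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stopped at, best weights from
    return len(val_losses) - 1, best_epoch

# Validation losses resembling the training run below
stopped, best = early_stopping_trace(
    [0.35, 0.31, 0.26, 0.25, 0.20, 0.19, 0.14, 0.10, 0.06,
     0.041, 0.053, 0.056, 0.046, 0.045, 0.045], patience=5)
print(stopped, best)  # 14 9: stops five epochs after the best epoch
```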

# Hyperparameters
lstm_units = 50
dropout_rate = 0.2
batch_size = 32
epochs = 50
learning_rate = 0.001
optimizer = 'rmsprop'
use_batch_norm = True

# Define input and output shapes
input_shape = (X_train_combined.shape[1], X_train_combined.shape[2])  # (n_steps, num_pixels * num_features)
output_units = Y_train_combined.shape[1]  # num_pixels * num_features

# Model creation
def create_lstm_model(input_shape, output_units, lstm_units, dropout_rate, use_batch_norm, learning_rate):
    """
    Creates and compiles an LSTM model with the specified parameters.
    
    Args:
        input_shape (tuple): Shape of the input data (n_steps, num_pixels * num_features).
        output_units (int): Number of output units (num_pixels * num_features).
        lstm_units (int): Number of LSTM units.
        dropout_rate (float): Dropout rate for regularization.
        use_batch_norm (bool): Whether to use batch normalization.
        learning_rate (float): Learning rate for the optimizer.

    Returns:
        model (Sequential): Compiled LSTM model.
    """
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(Masking(mask_value=0))
    model.add(LSTM(lstm_units, return_sequences=True))
    if use_batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(LSTM(lstm_units))
    if use_batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(output_units))

    # Create optimizer instance
    optimizer_instance = RMSprop(learning_rate=learning_rate)

    # Compile model
    model.compile(optimizer=optimizer_instance, loss='mse')

    return model

# Create the LSTM model
model = create_lstm_model(input_shape, output_units, lstm_units, dropout_rate, use_batch_norm, learning_rate)

# Print the model summary
model.summary()

# Callback for Early Stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
2024-07-28 15:33:01.781971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-28 15:33:01.982418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory:  -> device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:05:00.0, compute capability: 7.0
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
masking (Masking)            (None, 70, 16384)         0         
_________________________________________________________________
lstm (LSTM)                  (None, 70, 50)            3287000   
_________________________________________________________________
batch_normalization (BatchNo (None, 70, 50)            200       
_________________________________________________________________
dropout (Dropout)            (None, 70, 50)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 50)                20200     
_________________________________________________________________
batch_normalization_1 (Batch (None, 50)                200       
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense (Dense)                (None, 376832)            19218432  
=================================================================
Total params: 22,526,032
Trainable params: 22,525,832
Non-trainable params: 200
_________________________________________________________________

5.4 4. Model training

The code trains the LSTM model using the combined training data (X_train_combined and Y_train_combined) with early stopping to halt training if performance on the validation split (20% of the training data) stops improving. After training, it evaluates the model’s performance and prints the test loss.

# Train the model
history = model.fit(X_train_combined, Y_train_combined, epochs=epochs, batch_size=batch_size, validation_split=0.2, callbacks=[early_stopping], verbose=1)

# Evaluation
loss = model.evaluate(X_test_combined, Y_test_combined, verbose=1)
print(f"Test loss: {loss}")

# Save the model
# model.save('LSTM.h5')
2024-07-28 15:33:18.306972: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/50
2024-07-28 15:33:23.339418: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8401
20/20 [==============================] - 10s 204ms/step - loss: 0.3286 - val_loss: 0.3497
Epoch 2/50
20/20 [==============================] - 1s 67ms/step - loss: 0.2917 - val_loss: 0.3134
Epoch 3/50
20/20 [==============================] - 1s 68ms/step - loss: 0.2443 - val_loss: 0.2575
Epoch 4/50
20/20 [==============================] - 1s 67ms/step - loss: 0.1949 - val_loss: 0.2546
Epoch 5/50
20/20 [==============================] - 1s 67ms/step - loss: 0.1454 - val_loss: 0.1960
Epoch 6/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0958 - val_loss: 0.1884
Epoch 7/50
20/20 [==============================] - 1s 68ms/step - loss: 0.0634 - val_loss: 0.1391
Epoch 8/50
20/20 [==============================] - 1s 68ms/step - loss: 0.0467 - val_loss: 0.1035
Epoch 9/50
20/20 [==============================] - 1s 68ms/step - loss: 0.0398 - val_loss: 0.0580
Epoch 10/50
20/20 [==============================] - 1s 68ms/step - loss: 0.0356 - val_loss: 0.0411
Epoch 11/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0343 - val_loss: 0.0530
Epoch 12/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0312 - val_loss: 0.0555
Epoch 13/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0337 - val_loss: 0.0460
Epoch 14/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0347 - val_loss: 0.0447
Epoch 15/50
20/20 [==============================] - 1s 67ms/step - loss: 0.0313 - val_loss: 0.0450
1/1 [==============================] - 2s 2s/step - loss: 0.0513
Test loss: 0.051280610263347626

5.5 5. Prediction and Denormalization

After the prediction is made, the results are denormalized using the inverse transformation of the scaler to revert them back to their original scale. This ensures that the test data and the model’s predictions are in the same scale as the original dataset for accurate comparison.
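The inverse transformation is the min-max scaling run backwards; sketched in numpy for a toy case (d_min and d_max stand in for the values the scaler learned during fit):

```python
import numpy as np

# Inverse of min-max normalization: scale back by the data range,
# then shift by the minimum (what scaler.inverse_transform computes)
d_min, d_max = -0.2, 0.8
normalized = np.array([0.0, 0.3, 0.6, 1.0])
denormalized = normalized * (d_max - d_min) + d_min
print(denormalized)  # [-0.2  0.1  0.4  0.8]
```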

# Load the model
# model = load_model('LSTM.h5')

# Make predictions
predictions = model.predict(X_test_combined)

# Denormalize the predictions
def denormalize_data(scaler, normalized_data):
    return scaler.inverse_transform(normalized_data.reshape(-1, 1)).reshape(normalized_data.shape)

# Ensuring that the test data and the predictions are transformed back to their original scale
last_scaler = scalers[-1]
Y_test_denormalized = denormalize_data(last_scaler, Y_test_combined)
predictions_denormalized = denormalize_data(last_scaler, predictions)

5.6 6. Save predictions

Here, the dimensions and time steps are configured. The mean squared error (MSE) between the true and predicted values is calculated for each cube. Finally, the reshaped predictions, along with the MSE, are saved to NetCDF files.
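The MSE computed for each cube is the mean of the squared differences between the flattened true and predicted arrays; a minimal numpy illustration (equivalent to sklearn's mean_squared_error):

```python
import numpy as np

# Toy true and predicted values
true_values = np.array([0.2, 0.4, 0.6])
predicted_values = np.array([0.1, 0.5, 0.6])

# Mean of the squared element-wise differences
mse = np.mean((true_values - predicted_values) ** 2)
print(round(mse, 6))  # 0.006667
```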

# Configuration
time_steps = 23
x_dim = 128
y_dim = 128

output_dir = 'data/test_predictions'

# Generate time stamps
time_stamps = pd.date_range(start='2021-07-03', periods=time_steps, freq='5D')

# Mapping of file names to test samples
file_to_sample = {}

# Names of test files corresponding to the predictions
test_file_names = [os.path.basename(f) for f in test_files]

# Reshape combined data back to the original form
Y_test_combined = Y_test_combined.reshape((Y_test_combined.shape[0], time_steps, x_dim, y_dim))

# Calculate MSE values
mse_list = []
for i in range(len(test_files)):
    true_values = Y_test_combined[i].reshape(-1)
    predicted_values = predictions_denormalized[i].reshape(-1)
    mse = mean_squared_error(true_values, predicted_values)
    mse_list.append(mse)

# Extract test samples and map to file names
for i, file_name in enumerate(test_file_names):
    sample = predictions_denormalized[i].reshape(time_steps, x_dim, y_dim)
    mse = mse_list[i]  # MSE value for the current file
    
    # Create output file path
    output_file = os.path.join(output_dir, 'LSTM_predicted_' + file_name)
    
    with nc.Dataset(output_file, 'w', format='NETCDF4') as ds:
        ds.createDimension('time', time_steps)
        ds.createDimension('x', x_dim)
        ds.createDimension('y', y_dim)
        
        # Save time coordinate as Datetime64
        time = ds.createVariable('time', 'f8', ('time',))
        time.units = 'days since 1970-01-01 00:00:00'
        time.calendar = 'gregorian'
        time[:] = nc.date2num(time_stamps.to_pydatetime(), units=time.units, calendar=time.calendar)
        
        # Save X and Y coordinates as integers
        x = ds.createVariable('x', 'i4', ('x',))
        y = ds.createVariable('y', 'i4', ('y',))
        ndvi = ds.createVariable('NDVI', 'f4', ('time', 'x', 'y'))
        mse_var = ds.createVariable('MSE', 'f4')  # Variable for the MSE value
        
        x[:] = np.arange(x_dim)
        y[:] = np.arange(y_dim)
        
        ndvi[:, :, :] = sample
        mse_var.assignValue(mse)  # Save MSE value
        
    # Save mapping
    file_to_sample[file_name] = output_file

# Check mappings and MSE values
print(file_to_sample)
print("MSE values:", mse_list)
{'Cube_1203_test.nc': 'data/test_predictions/LSTM_predicted_Cube_1203_test.nc', 'Cube_80_test.nc': 'data/test_predictions/LSTM_predicted_Cube_80_test.nc', 'Cube_1301_test.nc': 'data/test_predictions/LSTM_predicted_Cube_1301_test.nc', 'Cube_665_test.nc': 'data/test_predictions/LSTM_predicted_Cube_665_test.nc'}
MSE values: [0.092216894, 0.20980994, 0.29330707, 0.13620757]